As the most studied post-translational modification, protein phosphorylation is analyzed in
a growing number of proteomic experiments. These high-throughput approaches generate
large datasets, from which specific spectrum-based information can be hard to find. In
2007, the PhosPhAt database was launched to collect and present Arabidopsis phosphorylation
sites identified by mass spectrometry from and for the scientific community. At
present, PhosPhAt 3.0 consolidates phosphoproteomics data from 19 published proteomic
studies. Out of 5460 listed unique phosphoproteins, about 25% have been identified in
at least two independent experimental setups. This is especially important when considering
issues of false positive and false negative identification rates and data quality (Durek
et al., 2010). This valuable data set encompasses over 13205 unique phosphopeptides, with
unambiguous mapping to serine (77%), threonine (17%), and tyrosine (6%). Sorting the
functional annotations of experimentally found phosphorylated proteins in PhosPhAt using
Gene Ontology terms shows an over-representation of proteins in regulatory pathways and
signaling processes. A similar distribution is found when the PhosPhAt predictor, trained
on experimentally obtained plant phosphorylation sites, is used to predict phosphorylation
sites for the Arabidopsis genome. Finally, the possibility of pasting any protein sequence
into the PhosPhAt predictor allows species-independent use of the prediction resource.
In practice, PhosPhAt also allows easy exploitation of proteomic data for the design of
further targeted experiments.
INTRODUCTION

Protein post-translational modifications (PTMs) are among the fastest processes through
which plants respond to various stimuli, and they have therefore increasingly become the
focus of scientific studies. Among the various PTMs, phosphorylation is one of the
most studied, owing to the large number of affected proteins and its involvement in many
cellular processes such as signaling, nutrient uptake, and transport. In the mammalian field,
the function of particular phosphorylation sites in the activation or inactivation of
proteins, or as docking sites for interaction partners, has been particularly well studied
(Pawson and Gish, 1992; Chung et al., 1999; Yaffe, 2002; Pawson, 2004). So far,
most studies of protein phosphorylation in plant biology have focused on the phosphorylation
of specific proteins and protein families (Camoni et al., 2000; Hrabak et al., 2003) and on
specific signaling pathways (Wang et al., 2005). More recently, however, several unbiased
large-scale studies of plant protein phosphorylation have been carried out, either comparing
different physiological states or mutants (Li et al., 2009; Reiland et al., 2009;
Nakagami et al., 2010) or analyzing a time-course after stimulation (Niittyla et al., 2007;
Chen et al., 2010; Kline et al., 2010; Engelsberger and Schulze, 2011).

One characteristic of mass spectrometric analyses, however, is the generation of large
datasets, which usually remain difficult to access for the general public. Even if identified
peptides are listed, access to spectral data is often very limited. One way of
allowing public access is the use of data repositories like Tranche
(Smith et al., 2011), but even there much of the data is uploaded
in raw and possibly vendor-specific formats. This often results in a
time penalty and a high workload when only a few specific peptides need to be
verified individually. As a consequence, existing measurements are difficult to
reassess or to reuse for different purposes.

Thus, providing data storage and accessibility for this type of
experimental information is of utmost importance. The PhosPhAt
database of plant phosphorylation sites, which includes a plant
phosphorylation site predictor, provides such a resource. It aims to
compile predicted and experimental evidence of protein phosphorylation
from large-scale proteomic studies and to link it with bioinformatics
resources. Since its launch in 2007, the database has been updated
continuously with new experimental evidence from the
growing number of phosphoproteomic experiments (see list at
http://phosphat.mpimp-golm.mpg.de/).

Dynamic links to further external resources, such as Aramemnon
(Schwacke et al., 2003), the eFP browser (Winter et al., 2007), the
co-expression networks of ATTED-II (Obayashi et al., 2009), and
subcellular localization data (Heazlewood et al., 2008), are implemented
in PhosPhAt for each phosphoprotein. Additionally, the



www.frontiersin.org 



June 2012 | Volume 3 | Article 132 | 1 



Arsova and Schulze 



PhosPhAt review 



phosphorylation prediction function implemented in PhosPhAt
allows users to paste any given protein sequence into the prediction
query window and obtain a prediction of phosphorylation
sites. Thus, the support vector machine-based predictor trained on
Arabidopsis-specific phosphorylation sites can be used independently
of the plant species (Durek et al., 2010). Recently, the
PhosPhAt database itself has been integrated into a larger common
interface, the GATOR portal (Joshi et al., 2011), which allows
concurrent querying of various proteomic resources.

In its current form, the PhosPhAt database contains evidence
for 12404 different phosphorylation sites mapping to 5460 different
proteins, and 94284 high confidence predicted sites mapping
to 21764 proteins. We have found a significant overrepresentation
of proteins involved in regulatory and signaling processes among
the proteins phosphorylated with high confidence, while housekeeping
and other enzymatic functions are underrepresented (Heazlewood
et al., 2008; Figure 1).

The proteome-wide magnitude of protein phosphorylation
becomes apparent when looking at the high confidence predictions:
mapping the predicted phosphorylation sites to the proteins they affect
shows that about 64% of the proteins listed in TAIR9
(January 2010; Lamesch et al., 2012) are predicted to be phosphorylated
with high confidence (score >1). However, until now
only about one-quarter of these predicted sites have been experimentally
confirmed using mass spectrometry (Figure 1). Probably
owing to the focus of various proteomic studies on particular cellular
compartments (e.g., plasma membrane, chloroplasts), larger numbers
of experimentally confirmed phosphorylated proteins have
been found for particular functional categories (MapMan bins;
Thimm et al., 2004). Examples include proteins with functions
in photosynthesis (bin 1), glycolysis (bin 4), N-metabolism
(bin 12), C1-metabolism (bin 25), as well as microRNA and natural
antisense-related proteins (bin 32). In other functional categories,
the fraction of proteins with only predicted phosphorylation is
very high, and often only one-third of the phosphorylated
proteins have been identified experimentally (Figure 1). These
include signaling functions (bin 30), cytoskeleton and vesicle
trafficking (bin 31), as well as major carbohydrate metabolism
(bin 2). This not only points to the versatility of the processes in
which phosphorylation plays a role but also, from an experimental
point of view, indicates the work that remains to confirm predicted
phosphorylation sites.

The most challenging part, however, lies in the precise molecular
characterization of the identified phosphorylation sites with
regard to their effect on protein function. In this regard, current
knowledge is still very limited. To this end, it is extremely valuable
to study these experimentally determined phosphorylation sites
and their role in specific physiological conditions, tissue types, or
in the context of the whole organism.

Thus, besides global analysis of protein phosphorylation and
discovery of new phosphorylation sites, precise targeted studies
of particular proteins of interest are necessary to elucidate
in detail the role of phosphorylation sites in these proteins.
This becomes especially important as protein phosphorylation
functionally interacts with other protein modifications such as
methionine oxidation (Hardin et al., 2009), lysine acetylation
(van Noort et al., 2012), and ubiquitination (Hunter, 2007;
Thomas et al., 2009).

Therefore, in this mini review, we aim to provide a detailed
overview of the PhosPhAt features and to show, through a specific example,
how the PhosPhAt resources can be used in the experimental
design of targeted phosphoprotein analyses.

FUNCTIONS OF PhosPhAt RESOURCE 

The PhosPhAt web resource allows the user to search for experimental
and predicted phosphorylation sites in a given protein
(see phosphat.mpimp-golm.mpg.de). Queries can be based
on Arabidopsis gene identifiers (AGI codes), on peptide
sequences, or on protein annotation text. The advanced
search options allow users to include meta-information on the
experimental context (tissue type, experimental treatment, etc.).
Multiple AGI codes can be submitted both for queries of experimental
sites and for queries of phosphorylation site predictions.
Query results are then displayed in a multipage result window, sorted by gene
identifier.
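
Because several AGI codes can be submitted in one query, it can be useful to validate and de-duplicate an identifier list before pasting it into the search form. The following Python sketch is one possible way of doing this. It assumes the common AGI format (for example AT1G37130, optionally followed by a gene model suffix such as .1); the pattern and the function name `normalize_agi_codes` are illustrative assumptions and are not part of PhosPhAt.

```python
import re

# Assumed AGI pattern: "AT", chromosome (1-5, C or M), "G", five digits,
# and an optional gene model suffix such as ".1".
AGI_PATTERN = re.compile(r"^AT[1-5CM]G\d{5}(\.\d+)?$")


def normalize_agi_codes(raw_codes):
    """Return upper-cased, de-duplicated AGI codes in input order.

    Raises ValueError for entries that do not match the assumed pattern.
    """
    seen = []
    for code in raw_codes:
        cleaned = code.strip().upper()
        if not cleaned:
            continue  # skip empty entries, e.g. from trailing separators
        if not AGI_PATTERN.match(cleaned):
            raise ValueError(f"not a valid AGI code: {code!r}")
        if cleaned not in seen:
            seen.append(cleaned)
    return seen


if __name__ == "__main__":
    # Identifiers of the nitrogen-related proteins discussed later in this review.
    codes = ["at1g37130.1", "AT1G08090", " AT4G13510.1 ", "AT1G08090"]
    print("\n".join(normalize_agi_codes(codes)))
```

The cleaned list can then be pasted into the query window one identifier per line.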

Upon selecting one of the protein identifiers, followed by a
peptide, the protein prediction tab becomes activated and, upon
clicking, displays a detailed protein view tab. The top right corner
of this protein tab contains links to various other resources: SUBA,
TAIR, ATTED, Aramemnon, and GabiPD. Below the protein ID,
its functional description, and the MapMan bin classification, the
middle part of the protein tab is allocated to the phosphorylation
site predictor. Here, the amino acids from experimentally identified
peptides are underlined, and predicted phosphorylated amino
acids are marked with a green background. Amino acids that
were experimentally confirmed to be phosphorylated are shown
in bold, and hovering with the mouse over one of them displays
the details of this identification or prediction just below the
protein sequence. Positive score values indicate a positive prediction,
and increasing values indicate an increasing probability of
phosphorylation. Predicted Pfam domain structures are mapped
onto the protein sequence and displayed with a yellow background,
allowing the user to put the experimental and predicted phosphorylation
sites in a functional context (Durek et al., 2010). Below the
sequence display, a list of experimentally identified phosphopeptides
is available, with icons signifying MS spectrum availability
and quantitative information.
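
The interpretation of predictor scores described here can be summarized in a few lines of code. The Python sketch below is illustrative only: the rule that positive scores indicate a positive prediction and the high confidence cut-off (score >1) are taken from this review, whereas the function names, the labels, and the example scores are assumptions made for this example and are not PhosPhAt output.

```python
HIGH_CONFIDENCE_THRESHOLD = 1.0  # "score >1" is used for high confidence predictions


def classify_prediction(score):
    """Map a predictor score to a coarse label (labels are illustrative)."""
    if score > HIGH_CONFIDENCE_THRESHOLD:
        return "predicted, high confidence"
    if score > 0:
        return "predicted"
    return "not predicted"


def high_confidence_sites(scores_by_position):
    """Return the sorted positions whose score exceeds the threshold."""
    return sorted(
        position
        for position, score in scores_by_position.items()
        if score > HIGH_CONFIDENCE_THRESHOLD
    )


if __name__ == "__main__":
    # Hypothetical scores keyed by residue position; not real predictor output.
    example = {12: -0.4, 57: 0.3, 534: 1.8}
    for position, score in sorted(example.items()):
        print(position, score, classify_prediction(score))
```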

In the list of experimental data, phosphorylation sites are
marked as defined if the precise location of the phosphorylated
amino acid has been unambiguously determined by mass spectrometric
analysis. Clear identification of the phosphorylated amino
acid in a phosphopeptide often requires manual interpretation
of mass spectra and the use of additional fragment ion scoring
algorithms (Olsen et al., 2006; MacLean et al., 2008). These defined
sites are marked in PhosPhAt with brackets and a lowercase p, such
as (pS), (pT), (pY). Phosphorylation sites marked as ambiguous
were not clearly resolved by the mass spectrometric experiments.
These sites are marked as lowercase letters in brackets, e.g., (s),
(t), (y). Such undefined sites are usually putatively phosphorylated
amino acids in close proximity to one another. In PhosPhAt, the remark
"site undetermined" on the modified tryptic peptide is used to
mark those situations where no statement could be made on the
location of the phosphorylation site based on the mass spectrum
(Heazlewood et al., 2008).

Functional categorisation of phosphorylated proteins
FIGURE 1 | Distribution of experimental and predicted phosphorylation
sites to functional categories of MapMan. The bins which group encoded
proteins in their functional categories are: photosynthesis (PS)-1; major
carbohydrate (CHO) metabolism-2; minor CHO metabolism-3; glycolysis-4;
fermentation-5; gluconeogenese/glyoxylate cycle-6; OPP-7; TCA/org.
transformation-8; mitochondrial electron transport/ATP synthesis-9; cell
wall-10; lipid metabolism-11; N-metabolism-12; amino acid metabolism-13;
S-assimilation-14; metal handling-15; secondary metabolism-16;
hormone metabolism-17; co-factor and vitamin metabolism-18;
tetrapyrrole synthesis-19; stress-20; redox-21; polyamine metabolism-22;
nucleotide metabolism-23; biodegradation of xenobiotics-24;
C1-metabolism-25; misc-26; RNA-27; DNA-28; protein-29; signaling-30;
cell-31; microRNA, natural antisense etc-32; development-33; transport-34;
not assigned-35.

PhosPhAt spectrum for SV(pS)TPFMNTTAK
FIGURE 2 | Example spectra from PhosPhAt and selected transitions for quantification in targeted SRM analyses for NIA2.
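
The bracket notation for defined and ambiguous sites lends itself to simple parsing, for example when exported peptide lists are processed further. The following Python sketch extracts the plain sequence and the site positions from a modified peptide string. It assumes only the notation described in this review, (pS)/(pT)/(pY) for defined and (s)/(t)/(y) for ambiguous sites; the function name `parse_modified_peptide` is hypothetical, and the second peptide in the usage example is an invented illustration of the ambiguous notation rather than a database entry.

```python
import re

# "(pS)" style tokens are defined sites; "(s)" style tokens are ambiguous sites.
SITE_TOKEN = re.compile(r"\((?:p([STY])|([sty]))\)")


def parse_modified_peptide(modified):
    """Return (plain_sequence, defined_sites, ambiguous_sites).

    Sites are (position, residue) tuples; positions are 1-based and
    relative to the start of the peptide.
    """
    plain = []
    defined, ambiguous = [], []
    index = 0
    while index < len(modified):
        match = SITE_TOKEN.match(modified, index)
        if match:
            defined_residue, ambiguous_residue = match.groups()
            residue = (defined_residue or ambiguous_residue).upper()
            plain.append(residue)
            target = defined if defined_residue else ambiguous
            target.append((len(plain), residue))
            index = match.end()
        elif modified[index].isalpha():
            plain.append(modified[index].upper())
            index += 1
        else:
            raise ValueError(
                f"unexpected character at index {index}: {modified[index]!r}"
            )
    return "".join(plain), defined, ambiguous


if __name__ == "__main__":
    # Defined site, peptide taken from Figure 2.
    print(parse_modified_peptide("SV(pS)TPFMNTTAK"))
    # Invented example of two ambiguous candidate sites.
    print(parse_modified_peptide("EQ(s)FAF(s)VQSPIVHTDK"))
```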

Upon double-clicking on a peptide row in the list of experimentally
validated phosphopeptides, the fragment spectrum of
this ion, its experimental origin, and the available quantitative
information are displayed. The annotated ions in the fragment spectrum
are indicated by blue bars, and clicking one of them displays
the fragment-specific information. The mass list of each
peptide ion can be exported as a peak list (.csv format).
At the level of the primary query result, custom information
can also be exported as tab-delimited tables, in Mascot-compatible .mgf
format, or in Motif-X format (Schwartz and Gygi, 2005). In
all view pages, the information displayed can be adjusted
by clicking on the column title and selecting the desired information
for display. A complete tab-delimited table of all database
contents can be downloaded from the PhosPhAt main page
(phosphat.mpimp-golm.mpg.de).
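
Exported spectra in .mgf format can be read with a few lines of code. The sketch below is a minimal Python reader for the general layout of the Mascot generic format (BEGIN IONS / END IONS blocks with KEY=value parameters followed by m/z and intensity pairs). Which parameters a PhosPhAt export actually contains is not specified here, so the example spectrum, including its title and peak values, is invented for illustration, and the function name `parse_mgf` is hypothetical.

```python
def parse_mgf(text):
    """Parse Mascot generic format text into a list of spectrum dictionaries.

    Each spectrum has "params" (dict of KEY -> string value) and
    "peaks" (list of (m/z, intensity) tuples).
    """
    spectra, current = [], None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line and not line[0].isdigit():
                key, value = line.split("=", 1)
                current["params"][key.upper()] = value
            else:
                fields = line.split()
                mz = float(fields[0])
                intensity = float(fields[1]) if len(fields) > 1 else 0.0
                current["peaks"].append((mz, intensity))
    return spectra


# Invented example spectrum; the values do not come from PhosPhAt.
EXAMPLE_MGF = """BEGIN IONS
TITLE=hypothetical spectrum, invented values
PEPMASS=682.30
CHARGE=2+
147.11 120.0
248.16 85.5
END IONS
"""

if __name__ == "__main__":
    for spectrum in parse_mgf(EXAMPLE_MGF):
        print(spectrum["params"]["TITLE"], len(spectrum["peaks"]), "peaks")
```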

USING PhosPhAt RESOURCE FOR DESIGN OF 
TARGETED EXPERIMENTS 

The experimental and predicted phosphorylation sites available in 
PhosPhAt and particularly the spectra deposited in the database 
provide a valuable resource for targeted in-depth analysis of the 
role of protein phosphorylation in physiological contexts. 

The use of fragment spectrum libraries for the design
of targeted analyses has previously been described in detail
(Gillet et al., 2012). Examples of targeted analysis of metabolic
pathways are already available from yeast, providing a detailed
dynamic proteome profile of the glycolytic pathway or microbial
proteomes (Carroll et al., 2011; Schmidt et al., 2011). However,
the combination of targeted protein analysis with monitoring
of phosphorylation stoichiometry has not yet been widely
applied in plant science. The commonly used methods for
targeted protein quantification can also be applied to
phosphopeptides (Johnson et al., 2009), and synthetic standard
(phospho)peptides can be used to determine phosphorylation
stoichiometry (Steen et al., 2005). Both approaches require reliable
information on phosphopeptide identity and fragmentation
properties.
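
As a rough numerical illustration of what phosphorylation stoichiometry means, the Python sketch below computes the phosphorylated fraction of a site from the amounts of the phosphorylated and the non-phosphorylated form of the same peptide. It assumes that both signals have already been converted to comparable absolute amounts, for example by reference to synthetic standard peptides, because raw intensities of the two forms generally cannot be compared directly. The function name and the example numbers are hypothetical, and the sketch is not the procedure described by Steen et al. (2005).

```python
def phospho_occupancy(phospho_amount, nonphospho_amount):
    """Return the phosphorylated fraction of a site from calibrated amounts.

    Both arguments must be in the same unit (e.g., fmol) and non-negative.
    """
    if phospho_amount < 0 or nonphospho_amount < 0:
        raise ValueError("amounts must be non-negative")
    total = phospho_amount + nonphospho_amount
    if total == 0:
        raise ValueError("no signal for either peptide form")
    return phospho_amount / total


if __name__ == "__main__":
    # Hypothetical amounts in fmol for a phosphopeptide and its unmodified form.
    occupancy = phospho_occupancy(1.0, 3.0)
    print(f"phosphorylated fraction: {occupancy:.0%}")
```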

The starting point for a targeted phosphorylation site analysis
is a limited set of proteins of interest. A PhosPhAt query for these
proteins returns their experimentally identified phosphorylation sites,
and ideally an experimentally acquired fragment spectrum
is hosted in the PhosPhAt database. By clicking on the
peptide rows, the individual spectra can be assessed, and a combined
export of the peak lists is available from the first query result
page. The information from PhosPhAt may be complemented by
additional literature information about the biological relevance of
particular phosphorylation sites.

As an example, we are interested in studying the phosphorylation
stoichiometry of proteins involved in nitrogen uptake and assimilation.
A query of ammonium and nitrate transporters as well
as nitrate reductase reveals spectral information for nitrate reductase
(AT1G37130.1), a nitrate transporter NRT2.1 (AT1G08090),
and an ammonium transporter AMT1.1 (AT4G13510.1), among
others. We have selected peptides found in phosphorylated
and non-phosphorylated form for each of these proteins.
They are: SV(pS)TPFMNTTAK/SVSTPFMNTTAK for nitrate
reductase; EQSFAFSVQ(pS)PIVHTDK/EQSFAFSVQSPIVHTDK
for nitrate transporter NRT2.1; and ISSEDEMAGMDM(pT)R/
ISSEDEMAGMDMTR for ammonium transporter AMT1.1.
Independent studies show that phosphorylation of S534 in nitrate
reductase is involved in the regulation of the activity
of nitrate reductase NIA2, and especially in its interaction
with 14-3-3 proteins (Kaiser and Huber, 2001; Lillo et al.,
2004). Conveniently, this region is covered by the experimentally
identified phosphopeptide. To our knowledge, there is no precise
information about the function of the phosphorylation site
of the NRT2.1 transporter, although the level of phosphorylation
of this peptide has been found to change upon nitrate re-supply
to starved seedlings (Engelsberger and Schulze, 2011). For the
ammonium transporter, it has been shown that the protein is
inactivated by C-terminal phosphorylation at the threonine
residue in the experimentally confirmed phosphopeptide (Yuan
et al., 2007). Thus, on the one hand, among these three proteins we have
clear examples of experimentally verified phosphopeptides for which
phosphorylation has been shown to influence protein activity and which
can be used for diagnostic purposes in various mutants. On the
other hand, there are novel phosphopeptides, like the NRT2.1 peptide,
for which we know that the level of phosphorylation changes but
do not yet know how this influences the protein itself. Following
the choice of target peptides, a selected reaction monitoring (SRM)
method is designed as described (Lange et al., 2008) and applied
to mutant or wild-type plants subjected to various treatments.
When selecting the transitions to be monitored for each peptide,
we can use the annotated fragment spectra available from
PhosPhAt and select a number of reliable ions that can be reproducibly
monitored, as in the example of NIA2 shown in Figure 2.
PhosPhAt therefore also serves as a phosphopeptide library
resource.
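
To make the notion of a transition concrete, the Python sketch below computes theoretical precursor and singly charged y-ion m/z values for a peptide pair such as SV(pS)TPFMNTTAK/SVSTPFMNTTAK. It is a minimal sketch under stated assumptions: standard monoisotopic residue masses, a mass shift of 79.966331 Da (HPO3) for phosphorylation, and no other modifications such as methionine oxidation. The function names are hypothetical. The code only lists candidate m/z values; which fragments are actually reliable has to be taken from the annotated spectra, and the sketch does not reproduce the method of Lange et al. (2008).

```python
import re

# Standard monoisotopic residue masses in Da, rounded to five decimals.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276
PHOSPHO = 79.966331  # HPO3 added to S, T, or Y

TOKEN = re.compile(r"\(p([STY])\)|([A-Z])")


def residue_masses(modified):
    """Return per-residue masses for a peptide such as 'SV(pS)TPFMNTTAK'."""
    masses, position = [], 0
    for match in TOKEN.finditer(modified):
        if match.start() != position:
            raise ValueError(f"cannot parse {modified!r} at index {position}")
        position = match.end()
        phospho_residue, plain_residue = match.groups()
        if phospho_residue:
            masses.append(RESIDUE_MASS[phospho_residue] + PHOSPHO)
        else:
            masses.append(RESIDUE_MASS[plain_residue])  # KeyError if unknown
    if position != len(modified):
        raise ValueError(f"cannot parse {modified!r} at index {position}")
    return masses


def precursor_mz(modified, charge):
    """Theoretical m/z of the intact peptide at the given positive charge."""
    neutral_mass = sum(residue_masses(modified)) + WATER
    return (neutral_mass + charge * PROTON) / charge


def y_ions(modified):
    """Singly charged y-ion m/z values, ordered y1, y2, ... up to y(n-1)."""
    masses = residue_masses(modified)
    ions, running = [], WATER + PROTON
    for mass in reversed(masses[1:]):
        running += mass
        ions.append(running)
    return ions


if __name__ == "__main__":
    for peptide in ("SV(pS)TPFMNTTAK", "SVSTPFMNTTAK"):
        fragments = ", ".join(f"{mz:.4f}" for mz in y_ions(peptide)[-4:])
        print(f"{peptide}: [M+2H]2+ = {precursor_mz(peptide, 2):.4f}; "
              f"largest y ions: {fragments}")
```

Because the phosphorylated serine is the third residue of this peptide, only the y10 and y11 ions carry the phosphate group; all shorter y ions are identical for the two forms and cannot distinguish them.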

SUMMARY 

The PhosPhAt database was initiated to provide a resource that
consolidates our current knowledge of phosphorylation sites identified
by mass spectrometry in the model plant Arabidopsis.
It is combined with a phosphorylation site prediction tool
specifically trained on plant-type phosphorylation motifs. Thus,
PhosPhAt not only serves as a searchable knowledge base for
experimentally identified phosphorylation sites but also provides
a powerful resource for the characterization and annotation of
as yet unidentified phosphorylation sites in plant proteins. Furthermore,
the stored spectra for large numbers of phosphorylation
sites provide a direct resource for the design of additional targeted
experiments.